Refining phoneme segmentations using speaker-adaptive context dependent boundary models
Authors
Abstract
Consistent phoneme segmentation is essential for building high-quality Text-to-Speech (TTS) voice fonts. In this paper we propose adapting an existing, well-trained Context Dependent Boundary Model (CDBM) for boundary refinement to a new speaker, using a limited amount of manually segmented data. Three adaptation approaches are studied: MLLR, MAP, and a combination of the two. The combined approach, MLLR+MAP, delivers the best boundary refinement performance. Compared with other boundary segmentation methods, the adapted CDBM yields better results, especially when adaptation data are limited. Given a development set of 400 manually segmented boundary tokens (about 20 sentences), the segmentation precision reaches 90%, measured against human-labeled boundaries with a tolerance of 20 ms.
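The 20 ms figure is a tolerance-based boundary accuracy: the share of automatically placed boundaries that lie within 20 ms of the matching human-labeled boundary. The minimal Python sketch below computes that kind of metric. It assumes the predicted and reference boundaries are already paired one-to-one, which holds when the phone sequence is fixed, as in forced alignment for TTS. The function name `boundary_accuracy` and the inclusive `<=` comparison are illustrative choices and are not taken from the paper.

```python
from typing import Sequence


def boundary_accuracy(
    predicted_ms: Sequence[float],
    reference_ms: Sequence[float],
    tolerance_ms: float = 20.0,
) -> float:
    """Fraction of boundaries whose predicted time is within tolerance_ms of the reference.

    Hypothetical helper: assumes predicted_ms[i] and reference_ms[i] refer to the
    same phone boundary (one-to-one pairing), and counts a difference exactly
    equal to the tolerance as a hit.
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("predicted and reference boundaries must be paired one-to-one")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        1 for p, r in zip(predicted_ms, reference_ms) if abs(p - r) <= tolerance_ms
    )
    return hits / len(reference_ms)


if __name__ == "__main__":
    predicted = [100, 250, 410, 600]  # made-up boundary times in ms
    reference = [105, 280, 400, 612]  # made-up human labels in ms
    # Errors are 5, 30, 10 and 12 ms, so 3 of 4 fall within 20 ms.
    print(boundary_accuracy(predicted, reference))  # 0.75
```

Raising the tolerance loosens the criterion. With the same made-up data and `tolerance_ms=40.0`, all four boundaries count as hits.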
Similar sources
Speaker adaptation for context-dependent HMM using spatial relation of both phoneme context hierarchy and speakers
To achieve good speaker adaptation of context-dependent HMMs from a small amount of training data, unseen models must be adapted reasonably, using the relation between the observed models and the training data. In this paper, a new speaker adaptation method for context-dependent HMMs using two spatial constraints is proposed: 1) the spatial relation of the phoneme-context hierarchical models, and 2...
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the main obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Speaker Independent Phoneme Classification in Continuous Speech
This paper examines statistical models for phoneme classification. We compare the performance of our phoneme classification system using Gaussian mixture model (GMM) phoneme models with that of systems using hidden Markov model (HMM) phoneme models. Measurements show that our model's performance is comparable to that of HMM models in context-independent phoneme classification.
Automatic syllable-based phoneme recognition using ESTER Corpus
This paper presents an evaluation of speaker-independent continuous phoneme recognition systems on the French speech database ESTER. The tested systems are syllable-based phoneme recognizers, i.e. they use syllables as basic units together with syllabic bigram language models and HMM topologies adapted to syllables. Once identified, syllables are converted back to phones. In a previous paper, w...
Phoneme and sub-phoneme t-normalization for text-dependent speaker recognition
Test normalization (T-Norm) is a score normalization technique that is regularly and successfully applied in text-independent speaker recognition. It is less frequently applied to text-dependent or text-prompted speaker recognition, however, mainly because the improvement it brings in this context is more modest. In this paper we present a novel way to improve the performance of T-Norm f...
Publication date: 2005